231 research outputs found

    Fast Parallel Randomized Algorithm for Nonnegative Matrix Factorization with KL Divergence for Large Sparse Datasets

    Nonnegative Matrix Factorization (NMF) with Kullback-Leibler divergence (NMF-KL) is one of the most significant NMF problems and is equivalent to Probabilistic Latent Semantic Indexing (PLSI), which has been applied successfully in many settings. For sparse count data, a Poisson distribution and KL divergence provide sparse models and sparse representations, and they describe the random variation better than a normal distribution and the Frobenius norm. Specifically, sparse models give a more concise picture of how attributes appear across latent components, while sparse representations give a concise, interpretable account of how latent components contribute to each instance. However, minimizing NMF with KL divergence is much more difficult than minimizing NMF with the Frobenius norm, and sparse models, sparse representations, and fast algorithms for large sparse datasets remain open challenges for NMF with KL divergence. In this paper, we propose a fast parallel randomized coordinate descent algorithm with fast convergence on large sparse datasets to achieve sparse models and sparse representations. In experiments, the proposed algorithm outperforms current methods on this problem.
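    To make the NMF-KL objective concrete, the following is a minimal sketch, not the paper's parallel randomized coordinate descent algorithm. It substitutes the classical Lee-Seung multiplicative updates, which minimize the same generalized KL divergence D(V || WH). The function names (`matmul`, `kl_divergence`, `nmf_kl`), the small count matrix `V`, and the parameters `rank`, `iters`, `seed`, and `eps` are all assumptions made for illustration.

```python
import math
import random


def matmul(W, H):
    """Dense product of W (n x r) and H (r x m), both nested lists."""
    r, m = len(H), len(H[0])
    return [[sum(row[k] * H[k][j] for k in range(r)) for j in range(m)]
            for row in W]


def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH); zero entries of V add only WH."""
    total = 0.0
    for v_row, a_row in zip(V, WH):
        for v, a in zip(v_row, a_row):
            total += a - v
            if v > 0:
                total += v * math.log(v / (a + eps))
    return total


def nmf_kl(V, rank, iters=100, seed=0, eps=1e-12):
    """Factor V ~ W H with Lee-Seung multiplicative updates (illustrative only)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random initialization keeps every update well defined.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    history = [kl_divergence(V, matmul(W, H))]
    for _ in range(iters):
        # Update H: H[k][j] *= sum_i W[i][k] * V[i][j] / WH[i][j] / sum_i W[i][k]
        WH = matmul(W, H)
        for k in range(rank):
            w_sum = sum(W[i][k] for i in range(n)) + eps
            for j in range(m):
                num = sum(W[i][k] * V[i][j] / (WH[i][j] + eps) for i in range(n))
                H[k][j] *= num / w_sum
        # Update W symmetrically, using the refreshed product.
        WH = matmul(W, H)
        for k in range(rank):
            h_sum = sum(H[k][j] for j in range(m)) + eps
            for i in range(n):
                num = sum(H[k][j] * V[i][j] / (WH[i][j] + eps) for j in range(m))
                W[i][k] *= num / h_sum
        history.append(kl_divergence(V, matmul(W, H)))
    return W, H, history


# Hypothetical sparse count matrix (4 instances x 3 attributes).
V = [[3, 0, 1], [0, 2, 0], [4, 0, 2], [0, 1, 0]]
W, H, history = nmf_kl(V, rank=2)
print(history[0], history[-1])  # divergence before and after fitting
```

    This dense, pure-Python version does work proportional to every entry of V on each pass, so it is only suitable for toy inputs; exploiting the sparsity of V is the kind of cost the abstract's algorithm is aimed at.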

    Sequence-dependent histone variant positioning signatures

    Background: The nucleosome, the fundamental unit of chromatin, is formed by wrapping approximately 147 bp of DNA around an octamer of histone proteins. This histone core has many variants that differ from each other in biochemical composition as well as biological function. Although the deposition of histone variants onto chromatin has been implicated in many important biological processes, such as transcription and replication, the mechanisms by which they are deposited at target sites are still obscure. Results: By analyzing the genomic sequences of human nucleosomes bearing different histone variants, including H2A.Z, H3.3, and both (H3.3/H2A.Z, so-called double variant histones), we found that genomic sequence contributes in part to determining the target sites of different histone variants. Moreover, the dinucleotides CA/TG are remarkably important in distinguishing the target sites of H2A.Z-only nucleosomes from those of H3.3-containing (both H3.3-only and double variant) nucleosomes. Conclusions: There exists a DNA-related mechanism regulating the deposition of different histone variants onto chromatin and the biological outcomes thereof. This provides additional insight into the epigenetic regulatory mechanisms of many important cellular processes.